Efficient Skyline Computation in MapReduce

نویسندگان

  • Kasper Mullesgaard
  • Jens Laurits Pederseny
  • Hua Lu
  • Yongluan Zhou
چکیده

Skyline queries are useful for finding interesting tuples from a large data set according to multiple criteria. The sizes of data sets are constantly increasing and the architecture of back-ends are switching from single-node environments to non-conventional paradigms like MapReduce. Despite the usefulness of skyline queries, existing works on skyline computation in MapReduce do not take full advantage of parallelism but still run significant parts serially. In this paper, we propose a novel approach to compute skylines efficiently in MapReduce. We design a grid partitioning scheme to divide the data space into partitions, and employ a bitstring to represent the partitions. The bitstring is efficiently obtained in MapReduce, and it clearly helps prune partitions (and tuples) that cannot have skyline tuples. Based on the grid partitioning, we propose two MapReduce algorithms to compute skylines. Both algorithms utilize the bitstring and distribute the original tuples to multiple mappers and make use of them to compute local skylines in parallel. In particular, MapReduce Grid Partitioning based Single-Reducer Skyline Computation (MR-GPSRS) employs a single reducer to assemble the local skylines appropriately to compute the global skyline. In contrast, MapReduce Grid Partitioning based Multiple Reducer Skyline Computation (MR-GPMRS) further divides local skylines and distributes them to multiple reducers that compute the global skyline in an independent and parallel manner. The proposed algorithms are evaluated through extensive experiments, and the results show that MR-GPMRS significantly outperforms the alternatives in various settings.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Parallel Computation of Skyline and Reverse Skyline Queries Using MapReduce

The skyline operator and its variants such as dynamic skyline and reverse skyline operators have attracted considerable attention recently due to their broad applications. However, computations of such operators are challenging today since there is an increasing trend of applications to deal with big data. For such data-intensive applications, the MapReduce framework has been widely used recent...

متن کامل

Adapting Skyline Computation to the MapReduce Framework: Algorithms and Experiments

This paper addresses the problem of skyline computation under the MapReduce framework. As a parallel programming model for data-intensive computing applications, MapReduce runs on a cluster of commercial PCs with the main idea of task decomposition and result reduction. Based on different data partitioning strategies, three MapReduce style skyline computation algorithms are developed: MapReduce...

متن کامل

Efficient Skyline Computation for Large Volume Data in MapReduce Utilising Multiple Reducers

A skyline query is useful for extracting a complete set of interesting tuples from a large data set according to multiple criteria. The sizes of data sets are constantly increasing and the architecture of backends are switching from single node environments to cluster oriented setups. Previous work has presented ways to run the skyline query in these setups using the MapReduce framework, but th...

متن کامل

Simultaneous Processing of Multi-Skyline Queries with MapReduce

With rapid increase of the number of applications as well as the sizes of data, multi-query processing on the MapReduce framework has gained much attention. Meanwhile, there have been much interest in skyline query processing due to its power of multi-criteria decision making and analysis. Recently, there have been attempts to optimize multi-query processing in MapReduce. However, they are not ...

متن کامل

Skyline recommendation in distributed networks

Skyline recommendation technology has recently received a lot of attention in the database community. However, the existing works mostly focus on how to obtain skyline objects from fine-grained data in centralized environments. And the time cost of skyline recommendation will increase exponentially as the number of data and skyline recommendation instructions increases, which will seriously inf...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014